A Variational Analysis of Stochastic Gradient Algorithms
Authors
Abstract
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD so as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.
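As a rough illustration of this idea, the sketch below (not the paper's algorithm or its tuning formulas; all constants and helper names are illustrative assumptions) runs constant-rate SGD on a toy Bayesian linear-Gaussian model and compares the covariance of the post-burn-in iterates with the exact posterior covariance.

```python
# A minimal sketch, assuming a toy conjugate model: constant-rate SGD produces
# iterates that, after burn-in, fluctuate around the mode; we compare their
# covariance with the exact Gaussian posterior covariance.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = X w + noise, with a standard normal prior on w.
N, D, noise_std = 10_000, 2, 1.0
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0])
y = X @ w_true + noise_std * rng.normal(size=N)

# Exact Gaussian posterior for reference: precision = I + X^T X / noise^2.
post_cov = np.linalg.inv(np.eye(D) + X.T @ X / noise_std**2)

def stoch_grad(w, idx):
    """Minibatch estimate of the gradient of the negative log-posterior."""
    Xb, yb = X[idx], y[idx]
    lik = (N / len(idx)) * Xb.T @ (Xb @ w - yb) / noise_std**2  # rescaled likelihood term
    return lik + w                                              # Gaussian prior term

# Constant-rate SGD: post-burn-in iterates are treated as draws from the
# stationary distribution of the process.
eps, batch, T, burn = 5e-5, 32, 20_000, 5_000
w = np.zeros(D)
samples = []
for t in range(T):
    idx = rng.integers(0, N, size=batch)
    w = w - eps * stoch_grad(w, idx)
    if t >= burn:
        samples.append(w.copy())

print("exact posterior covariance:\n", post_cov)
print("covariance of SGD iterates:\n", np.cov(np.array(samples).T))
```

With an arbitrary constant rate the two covariances generally disagree; the point of the abstract is that the learning rate (and preconditioning) can be chosen so that the stationary distribution matches the posterior.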
منابع مشابه
Importance Sampled Stochastic Optimization for Variational Inference
Variational inference approximates the posterior distribution of a probabilistic model with a parameterized density by maximizing a lower bound for the model evidence. Modern solutions fit a flexible approximation with stochastic gradient descent, using Monte Carlo approximation for the gradients. This enables variational inference for arbitrary differentiable probabilistic models, and conseque...
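To make the "stochastic gradient descent with Monte Carlo gradients" recipe mentioned above concrete, here is a minimal sketch (an assumed setup, not taken from the cited paper) of reparameterization-gradient variational inference for a mean-field Gaussian approximation, optimized with plain stochastic gradient ascent on the evidence lower bound.

```python
# A minimal sketch: single-sample reparameterization gradients of the ELBO for
# q(z) = N(mu, diag(exp(log_sigma))^2), fit to an unnormalized 2-D Gaussian
# target. The target, step size, and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

# Unnormalized target density with a known gradient of its log.
target_mean = np.array([1.0, -1.0])
target_prec = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 2.0]]))

def grad_log_p(z):
    return -target_prec @ (z - target_mean)

mu, log_sigma = np.zeros(2), np.zeros(2)
step = 1e-2

for t in range(5_000):
    eps = rng.normal(size=2)                    # base noise
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps                        # reparameterized sample from q
    g = grad_log_p(z)                           # single-sample Monte Carlo gradient
    grad_mu = g                                 # d ELBO / d mu
    grad_log_sigma = g * eps * sigma + 1.0      # likelihood term + entropy term
    mu += step * grad_mu                        # stochastic gradient ascent on the ELBO
    log_sigma += step * grad_log_sigma

print("variational mean:", mu, " target mean:", target_mean)
print("variational std :", np.exp(log_sigma))
```

The estimator uses only the gradient of the unnormalized log target, which is what makes this style of inference applicable to arbitrary differentiable probabilistic models.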
Convergence of Proximal-Gradient Stochastic Variational Inference under Non-Decreasing Step-Size Sequence
Several recent works have explored stochastic gradient methods for variational inference that exploit the geometry of the variational-parameter space. However, the theoretical properties of these methods are not well-understood and these methods typically only apply to conditionallyconjugate models. We present a new stochastic method for variational inference which exploits the geometry of the ...
Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions
Several recent works have explored stochastic gradient methods for variational inference that exploit the geometry of the variational-parameter space. However, the theoretical properties of these methods are not well-understood and these methods typically only apply to conditionallyconjugate models. We present a new stochastic method for variational inference which exploits the geometry of the ...
Stochastic Variational Inference with Gradient Linearization
Variational inference has experienced a recent surge in popularity owing to stochastic approaches, which have yielded practical tools for a wide range of model classes. A key benefit is that stochastic variational inference obviates the tedious process of deriving analytical expressions for closed-form variable updates. Instead, one simply needs to derive the gradient of the log-posterior, whic...
Re-using gradient computations in automatic variational inference
Automatic variational inference has recently become feasible as a scalable inference tool for probabilistic programming. The state-of-the-art algorithms are stochastic in two respects: they use stochastic gradient descent to optimize an expectation that is estimated with stochastic approximation. The core computation of such algorithms involves evaluating the loss and its automatically differen...